Executive summary
TODO - write executive summary
Libraries
Libraries used to prepare the report.
library(knitr)
library(dplyr)
library(R.utils)
library(data.table)
library(tools)
library(stringr)
library(ggplot2)
library(plotly)
library(tidyr)
library(scales)
Data loading
The code below loads every gzip-compressed CSV file found in the data
folder into its own data table, named after the file with a _df suffix.
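folder_name <- "data"
# List the compressed CSV files; both dots in the pattern are escaped
csv_files <- list.files(folder_name,
                        pattern = "\\.csv\\.gz$",
                        full.names = FALSE)
# Strip both extensions, e.g. "colors.csv.gz" -> "colors"
files_names <- file_path_sans_ext(csv_files, compression = TRUE)
# Read each file into a data.table named <file>_df
for (file_name in files_names) {
  assign(paste0(file_name, "_df"),
         fread(file.path(folder_name,
                         paste0(file_name, ".csv.gz"))))
}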
csv_files
## [1] "colors.csv.gz" "elements.csv.gz"
## [3] "inventories.csv.gz" "inventory_minifigs.csv.gz"
## [5] "inventory_parts.csv.gz" "inventory_sets.csv.gz"
## [7] "minifigs.csv.gz" "part_categories.csv.gz"
## [9] "part_relationships.csv.gz" "parts.csv.gz"
## [11] "sets.csv.gz" "themes.csv.gz"
Data introduction
The section below presents the datasets used for analysis, including
their structure, dimensions, and basic statistics.
Total dataset size
| Rows | Columns | Values | NA share |
| 1446639 | 45 | 8099232 | 0.29% |
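The four totals can be cross-checked against the individual tables: summing the per-dataset dimensions listed later in this section gives the same 1446639 rows and 45 columns. The following is a minimal sketch of how such totals might be computed, not necessarily the code used for the report. It uses base R only, and the two small tables (including their column names) are made-up stand-ins for the report's `*_df` objects.

```r
# Hypothetical stand-ins for the loaded *_df tables
datasets <- list(
  colors_df = data.frame(id = c(-1, 0, 1),
                         name = c("[Unknown]", "Black", "Blue"),
                         stringsAsFactors = FALSE),
  themes_df = data.frame(id = c(1, 3),
                         name = c("Technic", "Competition"),
                         parent_id = c(NA, 1),
                         stringsAsFactors = FALSE)
)

# Totals across all tables
total_rows  <- sum(sapply(datasets, nrow))
total_cols  <- sum(sapply(datasets, ncol))
total_cells <- sum(sapply(datasets, function(d) nrow(d) * ncol(d)))
total_na    <- sum(sapply(datasets, function(d) sum(is.na(d))))

# Share of missing values, formatted as a percentage
na_share <- sprintf("%.2f%%", 100 * total_na / total_cells)

print(data.frame(rows = total_rows, columns = total_cols,
                 values = total_cells, na_share = na_share))
```

With the real data, the list could presumably be collected with `mget(paste0(files_names, "_df"))` after the loading loop.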
Data structure

Datasets summaries
Colors
Colors dataset dimensions
| 263 | 4 |
Colors dataset basic statistics
| Min. : -1.0 | Length:263 | Length:263 | Length:263 |
| 1st Qu.: 83.0 | Class :character | Class :character | Class :character |
| Median :1005.0 | Mode :character | Mode :character | Mode :character |
| Mean : 651.4 | | | |
| 3rd Qu.:1070.5 | | | |
| Max. :9999.0 | | | |
Head of Colors dataset
| -1 | [Unknown] | 0033B2 | f |
| 0 | Black | 05131D | f |
| 1 | Blue | 0055BF | f |
| 2 | Green | 237841 | f |
| 3 | Dark Turquoise | 008F9B | f |
| 4 | Red | C91A09 | f |
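Each dataset in this section is described by the same three views: dimensions, basic statistics, and the first rows. A minimal base R sketch of those three views follows, assuming they come from `dim()`, `summary()`, and `head()`. The column names below are assumptions, since the report does not show them, and the table is a four-row excerpt rather than the full Colors dataset.

```r
# Four-row excerpt of the Colors data; column names are assumed
colors_df <- data.frame(
  id = c(-1, 0, 1, 2),
  name = c("[Unknown]", "Black", "Blue", "Green"),
  rgb = c("0033B2", "05131D", "0055BF", "237841"),
  is_trans = c("f", "f", "f", "f"),
  stringsAsFactors = FALSE
)

dims <- dim(colors_df)            # c(number of rows, number of columns)
stats <- summary(colors_df)       # per-column statistics
first_rows <- head(colors_df, 6)  # at most the first six rows

print(dims)
print(stats)
print(first_rows)
```

In a knitted report, each object would typically be passed to `knitr::kable()` to render it as a table.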
Elements
Elements dataset dimensions
| 84138 | 4 |
Elements dataset basic statistics
| Min. : 9327 | Length:84138 | Min. : -1.0 | Min. : 1001 |
| 1st Qu.: 4259774 | Class :character | 1st Qu.: 8.0 | 1st Qu.: 18454 |
| Median : 6057754 | Mode :character | Median : 28.0 | Median : 41748 |
| Mean : 5222065 | | Mean : 539.7 | Mean : 45570 |
| 3rd Qu.: 6262024 | | 3rd Qu.: 135.0 | 3rd Qu.: 75474 |
| Max. :61532443 | | Max. :9999.0 | Max. :107520 |
| | | | NA’s :23682 |
Head of Elements dataset
| 6443403 | 2277c01pr0009 | 1 | 2277 |
| 6300211 | 67906c01 | 14 | 67908 |
| 4566309 | 2564 | 0 | 2564 |
| 4275423 | 53657 | 1004 | 53657 |
| 6194308 | 92926 | 71 | 28967 |
| 6229123 | 26561 | 4 | 26561 |
Inventories
Inventories dataset dimensions
| 37265 | 3 |
Inventories dataset basic statistics
| Min. : 1 | Min. : 1.000 | Length:37265 |
| 1st Qu.: 14424 | 1st Qu.: 1.000 | Class :character |
| Median : 54379 | Median : 1.000 | Mode :character |
| Mean : 61104 | Mean : 1.091 | |
| 3rd Qu.: 88842 | 3rd Qu.: 1.000 | |
| Max. :194312 | Max. :16.000 | |
Head of Inventories dataset
| 1 | 1 | 7922-1 |
| 3 | 1 | 3931-1 |
| 4 | 1 | 6942-1 |
| 15 | 1 | 5158-1 |
| 16 | 1 | 903-1 |
| 17 | 1 | 850950-1 |
Inventory minifigs
Inventory minifigs dataset dimensions
| 20858 | 3 |
Inventory minifigs dataset basic statistics
| Min. : 3 | Length:20858 | Min. : 1.000 |
| 1st Qu.: 7869 | Class :character | 1st Qu.: 1.000 |
| Median : 15681 | Mode :character | Median : 1.000 |
| Mean : 43010 | | Mean : 1.062 |
| 3rd Qu.: 66834 | | 3rd Qu.: 1.000 |
| Max. :194312 | | Max. :100.000 |
Head of Inventory minifigs dataset
| 3 | fig-001549 | 1 |
| 4 | fig-000764 | 1 |
| 19 | fig-000555 | 1 |
| 25 | fig-000574 | 1 |
| 26 | fig-000842 | 1 |
| 26 | fig-008641 | 1 |
Inventory parts
Inventory parts dataset dimensions
| 1180987 | 6 |
Inventory parts dataset basic statistics
| Min. : 1 | Length:1180987 | Min. : -1.0 | Min. : 1.00 | Length:1180987 | Length:1180987 |
| 1st Qu.: 9404 | Class :character | 1st Qu.: 4.0 | 1st Qu.: 1.00 | Class :character | Class :character |
| Median : 22838 | Mode :character | Median : 15.0 | Median : 2.00 | Mode :character | Mode :character |
| Mean : 50849 | | Mean : 131.8 | Mean : 3.37 | | |
| 3rd Qu.: 87088 | | 3rd Qu.: 71.0 | 3rd Qu.: 4.00 | | |
| Max. :194312 | | Max. :9999.0 | Max. :3064.00 | | |
Inventory sets
Inventory sets dataset dimensions
| 4358 | 3 |
Inventory sets dataset basic statistics
| Min. : 35 | Length:4358 | Min. : 1.000 |
| 1st Qu.: 8076 | Class :character | 1st Qu.: 1.000 |
| Median : 16423 | Mode :character | Median : 1.000 |
| Mean : 52519 | | Mean : 1.813 |
| 3rd Qu.: 98685 | | 3rd Qu.: 1.000 |
| Max. :191576 | | Max. :60.000 |
Head of Inventory sets dataset
| 35 | 75911-1 | 1 |
| 35 | 75912-1 | 1 |
| 39 | 75048-1 | 1 |
| 39 | 75053-1 | 1 |
| 50 | 4515-1 | 1 |
| 50 | 4520-1 | 2 |
Minifigs
Minifigs dataset dimensions
| 13764 | 4 |
Minifigs dataset basic statistics
| Length:13764 | Length:13764 | Min. : 0.000 | Length:13764 |
| Class :character | Class :character | 1st Qu.: 4.000 | Class :character |
| Mode :character | Mode :character | Median : 4.000 | Mode :character |
| | | Mean : 5.296 | |
| | | 3rd Qu.: 5.000 | |
| | | Max. :156.000 | |
Part categories
Part categories dataset dimensions
| 66 | 2 |
Part categories dataset basic statistics
| Min. : 1.00 | Length:66 |
| 1st Qu.:19.25 | Class :character |
| Median :35.50 | Mode :character |
| Mean :35.36 | |
| 3rd Qu.:51.75 | |
| Max. :68.00 | |
Head of Part categories dataset
| 1 | Baseplates |
| 3 | Bricks Sloped |
| 4 | Duplo, Quatro and Primo |
| 5 | Bricks Special |
| 6 | Bricks Wedged |
| 7 | Containers |
Part relationships
Part relationships dataset dimensions
| 29977 | 3 |
Part relationships dataset basic statistics
| Length:29977 | Length:29977 | Length:29977 |
| Class :character | Class :character | Class :character |
| Mode :character | Mode :character | Mode :character |
Head of Part relationships dataset
| P | 3626cpr3662 | 3626c |
| P | 87079pr9974 | 87079 |
| P | 3960pr9971 | 3960 |
| R | 98653pr0003 | 98086pr0003 |
| R | 98653pr0003 | 98088pat0003 |
| R | 98653pr0003 | 98089pat0003 |
Parts
Parts dataset dimensions
| 52615 | 4 |
Parts dataset basic statistics
| Length:52615 | Length:52615 | Min. : 1.00 | Length:52615 |
| Class :character | Class :character | 1st Qu.:17.00 | Class :character |
| Mode :character | Mode :character | Median :41.00 | Mode :character |
| | | Mean :38.91 | |
| | | 3rd Qu.:60.00 | |
| | | Max. :68.00 | |
Head of Parts dataset
| 003381 | Sticker Sheet for Set 663-1 | 58 | Plastic |
| 003383 | Sticker Sheet for Sets 618-1, 628-2 | 58 | Plastic |
| 003402 | Sticker Sheet for Sets 310-3, 311-1, 312-3 | 58 | Plastic |
| 003429 | Sticker Sheet for Set 1550-1 | 58 | Plastic |
| 003432 | Sticker Sheet for Sets 357-1, 355-1, 940-1 | 58 | Plastic |
| 003434 | Sticker Sheet for Set 575-2, 653-1, 460-1 | 58 | Plastic |
Sets
Sets dataset dimensions
| 21880 | 6 |
Sets dataset basic statistics
| Length:21880 | Length:21880 | Min. :1949 | Min. : 1 | Min. : 0.0 | Length:21880 |
| Class :character | Class :character | 1st Qu.:2001 | 1st Qu.:273 | 1st Qu.: 3.0 | Class :character |
| Mode :character | Mode :character | Median :2012 | Median :497 | Median : 31.0 | Mode :character |
| | | Mean :2008 | Mean :442 | Mean : 161.4 | |
| | | 3rd Qu.:2018 | 3rd Qu.:608 | 3rd Qu.: 139.0 | |
| | | Max. :2024 | Max. :752 | Max. :11695.0 | |
Themes
Themes dataset dimensions
| 468 | 3 |
Themes dataset basic statistics
| Min. : 1.0 | Length:468 | Min. : 1.0 |
| 1st Qu.:250.5 | Class :character | 1st Qu.:186.0 |
| Median :466.0 | Mode :character | Median :411.0 |
| Mean :433.5 | | Mean :360.6 |
| 3rd Qu.:625.2 | | 3rd Qu.:512.5 |
| Max. :752.0 | | Max. :697.0 |
| | | NA’s :145 |
Head of Themes dataset
| 1 | Technic | |
| 3 | Competition | 1 |
| 4 | Expert Builder | 1 |
| 16 | RoboRiders | 1 |
| 17 | Speed Slammers | 1 |
| 18 | Star Wars | 1 |
Detailed analysis
Colors
Most popular colors of parts
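One plausible way to rank part colors is to sum the quantities recorded in the inventory parts table per color and then attach the color names. The base R sketch below shows that approach on made-up data. The column names (`color_id`, `quantity`, `id`, `name`) and the choice to sum quantities rather than count rows are assumptions, not something the report states.

```r
# Hypothetical excerpts; column names are assumed
inventory_parts_df <- data.frame(
  color_id = c(0, 0, 1, 4, 4, 4),
  quantity = c(2, 3, 1, 5, 1, 1)
)
colors_df <- data.frame(
  id = c(0, 1, 4),
  name = c("Black", "Blue", "Red"),
  stringsAsFactors = FALSE
)

# Total quantity of parts per color
usage <- aggregate(quantity ~ color_id, data = inventory_parts_df, FUN = sum)

# Attach color names and sort from most to least used
usage <- merge(usage, colors_df, by.x = "color_id", by.y = "id")
usage <- usage[order(-usage$quantity), ]

print(head(usage, 10))
```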

Distribution of colors by transparency

Elements
Most popular elements colors

Minifigs
Most popular number of parts used to build minifigs

Most popular minifigs

Most popular minifigs
| Skeleton, Standard Face, Ball Joint Arms (3626b Head) | 43 |
| Battle Droid, One Bent Arm, One Straight Arm | 40 |
| Classic Spaceman, Red with Airtanks (3842a Helmet) | 39 |
| Classic Spaceman, White with Airtanks (3842a Helmet) | 37 |
| Steve | 27 |
| Policeman, Black Suit with Pocket and Badge, White Hat (3626a Head) | 24 |
| Battle Droid, Two Bent Arms | 21 |
| Classic Spaceman, Yellow with Airtanks (3842b Helmet) | 21 |
| Johnny Thunder (Desert) | 21 |
| Chewbacca, Reddish Brown | 20 |
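A ranking like the one above could be obtained by aggregating the inventory minifigs table per figure and joining the result to the minifig names. The sketch below is one possible reading, in base R on made-up data. It sums quantities, although the report does not say whether the counts are summed quantities or numbers of inventories. The column names and the placeholder names "Minifig A" and "Minifig B" are hypothetical.

```r
# Hypothetical excerpts; column names and minifig names are assumed
inventory_minifigs_df <- data.frame(
  inventory_id = c(3, 4, 19, 25, 26),
  fig_num = c("fig-001549", "fig-000764", "fig-001549",
              "fig-000764", "fig-001549"),
  quantity = c(1, 1, 2, 1, 1),
  stringsAsFactors = FALSE
)
minifigs_df <- data.frame(
  fig_num = c("fig-001549", "fig-000764"),
  name = c("Minifig A", "Minifig B"),
  stringsAsFactors = FALSE
)

# Total quantity per minifig across all inventories
totals <- aggregate(quantity ~ fig_num, data = inventory_minifigs_df, FUN = sum)

# Attach names, sort descending, keep the ten most frequent
totals <- merge(totals, minifigs_df, by = "fig_num")
totals <- totals[order(-totals$quantity), c("name", "quantity")]
top_minifigs <- head(totals, 10)

print(top_minifigs)
```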
Themes
Most popular themes

Parts
Most popular parts material
Most popular parts categories
Correlation
Total number of colors used in the sets and the year
Correlation between the total number of colors and the year
| Year | 1.000 | 0.823 |
| Number of colors | 0.823 | 1.000 |
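The table is a two-by-two correlation matrix, so the off-diagonal value is the coefficient of interest. A matrix of this shape can be produced with `cor()`, as in the minimal sketch below. The yearly values are made up, the column names are hypothetical, and the Pearson coefficient is an assumption, since the report does not name the method.

```r
# Made-up yearly aggregates; column names are hypothetical
per_year <- data.frame(
  year = c(2000, 2001, 2002, 2003, 2004),
  n_colors = c(20, 22, 21, 27, 30)
)

# Pairwise correlation matrix (Pearson is assumed here)
cor_matrix <- cor(per_year, method = "pearson")

print(round(cor_matrix, 3))
```

The same call applies to the other two pairs in this section once the per-year totals of parts or sets are in place of `n_colors`.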
Total number of parts per year and year
Correlation between total number of parts per year and year of production
| Year | 1.000 | 0.806 |
| Total number of parts | 0.806 | 1.000 |
Total number of sets and year
Correlation between total number of sets and year of production
| Year | 1.000 | 0.879 |
| Total number of sets | 0.879 | 1.000 |
Trends
Number of sets over time for most popular theme
Mean number of parts over years for themes
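The mean number of parts per theme and year is a grouped average over the sets table. A minimal base R sketch on made-up data follows. The column names are assumptions, and the theme labels "A" and "B" are placeholders. With the real data, theme names would presumably first be joined from the themes table.

```r
# Made-up sets; column names and theme labels are hypothetical
sets_df <- data.frame(
  year = c(2020, 2020, 2021, 2021, 2021),
  theme = c("A", "A", "A", "B", "B"),
  num_parts = c(100, 200, 300, 50, 150),
  stringsAsFactors = FALSE
)

# Mean number of parts for every theme and year combination
mean_parts <- aggregate(num_parts ~ theme + year, data = sets_df, FUN = mean)

print(mean_parts)
```

The resulting long table, with one row per theme and year, is the shape that `ggplot2` line charts usually expect.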
Predictions / Forecasting
TODO